
9 research outputs found

Hercules Against Data Series Similarity Search

Author: Benbrahim Houda
Echihabi Karima
Fatourou Panagiota
Palpanas Themis
Zoumpatianos Kostas
Publication venue: VLDB Endowment
Publication date: 26/12/2022

We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perform CPU-intensive calculations. We demonstrate the superiority and robustness of Hercules with an extensive experimental evaluation against state-of-the-art techniques, using many synthetic and real datasets, and query workloads of varying difficulty. The results show that Hercules performs up to one order of magnitude faster than the best competitor (which is not always the same). Moreover, Hercules is the only index that outperforms the optimized scan in all scenarios, including the hard query workloads on disk-based datasets. This paper was published in the Proceedings of the VLDB Endowment, Volume 15, Number 10, June 2022.

arXiv.org e-Print Archive

ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees

Author: Bezerianos Anastasia
Echihabi Karima
Gogolou Anna
Palpanas Themis
Tsandilas Theophanis
Publication date: 26/12/2022

Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that improve during the similarity search, as well as suitable stopping criteria for the progressive queries. Moreover, we describe how this method can be used to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches. This paper was published in the VLDB Journal (2022).

arXiv.org e-Print Archive
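
For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of exact k-NN search over data series under the two distance measures it names, Euclidean and DTW, plus a naive progressive scan that reports each improved best-so-far answer. It is not the ProS method: it has no index, no learned model, and no probabilistic guarantees. All function names (`euclidean`, `dtw`, `knn`, `progressive_nn`) are hypothetical and chosen only for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW distance with squared point-wise cost.

    Uses the standard dynamic program, keeping only two rows in memory.
    """
    inf = float("inf")
    m = len(b)
    prev = [0.0] + [inf] * m  # row 0: only the empty alignment costs 0
    for i in range(1, len(a) + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def knn(query, collection, k, dist):
    """Exact k-NN by brute-force scan; returns (distance, index) pairs."""
    scored = ((dist(query, s), i) for i, s in enumerate(collection))
    return heapq.nsmallest(k, scored)


def progressive_nn(query, collection, dist):
    """Yield the best-so-far (distance, index) each time it improves.

    The last value yielded is the exact nearest neighbor.
    """
    best = (float("inf"), -1)
    for i, s in enumerate(collection):
        d = dist(query, s)
        if d < best[0]:
            best = (d, i)
            yield best
```

Passing `dtw` instead of `euclidean` as the `dist` argument switches the distance measure without changing the search code. A progressive method in the sense of the abstract would additionally estimate how far each intermediate answer is from the final one, which this sketch does not attempt.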

Unleashing early maturity academic innovations

Author: Abdennadher Slim
Aly Sherif G.
Echihabi Karima
Tekli Joe
Publication venue: AUC Knowledge Fountain
Publication date: 01/04/2021

The Arab region has many teaching-intensive universities that are intrinsically committed to holistic educational excellence. According to a recent UNESCO report, the higher education sector in the Arab region needs massive expansion, given exponentially growing populations and record-breaking youth cohorts, coupled with a strong recognition of the economic and social value of higher education. Such an enormous need for growth poses a significant challenge for publicly funded universities yet offers many opportunities for private universities to meet the ever-increasing demands of advanced education. On another front, computing education is trending in the region with a reputation for high market demand, a certain future, and high pay.

AUC Knowledge Fountain (American Univ. in Cairo)

ProS: data series progressive k-NN similarity search and classification with probabilistic quality guarantees

Author: Bezerianos Anastasia
Echihabi Karima
Gogolou Anna
Palpanas Themis
Tsandilas Theophanis
Publication venue: Springer Science and Business Media LLC
Publication date: 30/11/2022

Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that improve during the similarity search, as well as suitable stopping criteria for the progressive queries. Moreover, we describe how this method can be used to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches.

HAL-CentraleSupelec

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1

Fault-Tolerant Termination Detection with Safra’s Algorithm

Author: Echihabi Karima
Fokkink Wan
Fuchs Per
Karlos Georgios
Meyer Roland
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Safra’s distributed termination detection algorithm employs a logical token ring structure within a distributed network; only passive nodes forward the token, and a counter in the token keeps track of the number of sent minus the number of received messages. We adapt this classic algorithm to make it fault-tolerant. The counter is split into counters per node, to discard counts from crashed nodes. If a node crashes, the token ring is restored locally and a backup token is sent. Nodes inform each other of detected crashes via the token. Our algorithm imposes no additional message overhead, tolerates any number of crashes as well as simultaneous crashes, and copes with crashes in a decentralized fashion. Experiments with an implementation of our algorithm were performed on top of two fault-tolerant distributed algorithms
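
The token-ring mechanism described in the first sentence of the abstract above can be sketched roughly as follows. This is a single-process simulation of the classic (non-fault-tolerant) algorithm only; it does not implement the per-node counters, ring repair, or backup token of the paper. The names `Node`, `send`, `deliver`, and `token_round` are hypothetical, and two simplifications are assumed: no messages move while the token travels, and a round that meets an active node is simply abandoned rather than waiting for that node to become passive.

```python
from dataclasses import dataclass

WHITE, BLACK = "white", "black"


@dataclass
class Node:
    counter: int = 0      # messages sent minus messages received
    color: str = WHITE    # turns black when a message is received
    active: bool = False  # only passive nodes forward the token


def send(src, dst, in_flight):
    """src sends a basic message to dst; it stays in flight until delivered."""
    src.counter += 1
    in_flight.append(dst)


def deliver(in_flight):
    """Deliver the oldest in-flight message; the receiver becomes active and black."""
    dst = in_flight.pop(0)
    dst.counter -= 1
    dst.color = BLACK
    dst.active = True


def token_round(nodes):
    """Run one token trip around the ring, initiated by nodes[0].

    Returns True only if termination is detected in this round.
    """
    if any(n.active for n in nodes):
        return False  # simplification: the real token would wait here
    initiator = nodes[0]
    initiator.color = WHITE
    token_sum, token_color = 0, WHITE
    for node in nodes[1:]:
        token_sum += node.counter      # accumulate sent-minus-received
        if node.color == BLACK:
            token_color = BLACK        # a recent receive taints the round
        node.color = WHITE
    return (
        token_color == WHITE
        and initiator.color == WHITE
        and token_sum + initiator.counter == 0
    )
```

In this sketch, a round fails while a message is still in flight because the counters do not sum to zero, and the first round after a delivery fails because the receiver is black. Only a later, clean round reports termination.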

Data Series Progressive Similarity Search with Probabilistic Quality Guarantees

Author: Aßfalg Johannes
Brin Sergey
Chen Yihua
Ciaccia Paolo
Echihabi Karima
Faloutsos Christos
Fekete Jean-Daniel
Gogolou Anna
Hellerstein Joseph M.
Keogh Eamonn
Kondylakis Haridimos
Micallef Luana
Palpanas Themis
Peng Botao
Rodrigues Pedro Pereira
Wand Matt P.
Wu Sai
Publication venue: Association for Computing Machinery (ACM)
Publication date: 14/06/2020

Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We provide both initial and progressive estimates of the final answer that improve during the similarity search, as well as suitable stopping criteria for the progressive queries. Experiments with synthetic and diverse real datasets demonstrate that our prediction methods constitute the first practical solution to the problem, significantly outperforming competing approaches.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1